Overview

Dataset statistics

Number of variables: 11
Number of observations: 6362620
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 534.0 MiB
Average record size in memory: 88.0 B
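
As a quick sanity check, the memory figures above are consistent with one another. The sketch below assumes binary units (1 MiB = 1024² bytes) and 8 bytes per stored value; the latter is an assumption about how the columns are stored, not something the report states.

```python
# Figures copied from the Dataset statistics above.
N_ROWS = 6_362_620
N_COLS = 11
TOTAL_MIB = 534.0

MIB = 1024 ** 2        # bytes per MiB
BYTES_PER_VALUE = 8    # assumption: every cell takes 8 bytes (number or pointer)

# Average bytes per row, derived from the total size.
avg_record_bytes = TOTAL_MIB * MIB / N_ROWS

# Size of a single column under the 8-bytes-per-value assumption.
column_mib = N_ROWS * BYTES_PER_VALUE / MIB

print(f"average record size: {avg_record_bytes:.1f} B")
print(f"bytes per value:     {avg_record_bytes / N_COLS:.1f}")
print(f"one column:          {column_mib:.1f} MiB")
```

Under these assumptions the script reproduces the 88.0 B average record size and the 48.5 MiB memory size reported for each variable.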

Variable types

Numeric: 6
Categorical: 5

Warnings

nameOrig has a high cardinality: 6353307 distinct values (High cardinality)
nameDest has a high cardinality: 2722362 distinct values (High cardinality)
oldbalanceOrg is highly correlated with newbalanceOrig (High correlation)
newbalanceOrig is highly correlated with oldbalanceOrg (High correlation)
oldbalanceDest is highly correlated with newbalanceDest (High correlation)
newbalanceDest is highly correlated with oldbalanceDest (High correlation)
amount is highly correlated with oldbalanceDest and 1 other field (High correlation)
oldbalanceDest is highly correlated with amount and 1 other field (High correlation)
newbalanceDest is highly correlated with amount and 1 other field (High correlation)
type is highly correlated with newbalanceOrig (High correlation)
newbalanceOrig is highly correlated with type and 1 other field (High correlation)
amount is highly skewed (γ1 = 30.99394948) (Skewed)
nameOrig is uniformly distributed (Uniform)
oldbalanceOrg has 2102449 (33.0%) zeros (Zeros)
newbalanceOrig has 3609566 (56.7%) zeros (Zeros)
oldbalanceDest has 2704388 (42.5%) zeros (Zeros)
newbalanceDest has 2439433 (38.3%) zeros (Zeros)

Reproduction

Analysis started: 2021-09-24 18:33:34.929044
Analysis finished: 2021-09-24 18:41:29.199482
Duration: 7 minutes and 54.27 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

step
Real number (ℝ≥0)

Distinct: 743
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 243.3972456
Minimum: 1
Maximum: 743
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 48.5 MiB

Quantile statistics

Minimum: 1
5-th percentile: 16
Q1: 156
median: 239
Q3: 335
95-th percentile: 490
Maximum: 743
Range: 742
Interquartile range (IQR): 179

Descriptive statistics

Standard deviation: 142.331971
Coefficient of variation (CV): 0.5847723161
Kurtosis: 0.329070555
Mean: 243.3972456
Median Absolute Deviation (MAD): 92
Skewness: 0.3751768885
Sum: 1548644183
Variance: 20258.38998
Monotonicity: Increasing
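
Several of these statistics are derived from one another. The sketch below recomputes them from the values reported for step, using the standard definitions (CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 - Q1, range = maximum - minimum). The tolerance is an arbitrary choice that allows for the rounding of the printed figures.

```python
from math import isclose

# Values copied from the step statistics above.
mean, std = 243.3972456, 142.331971
q1, q3 = 156, 335
minimum, maximum = 1, 743

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

# Compare against the figures printed in the report.
assert isclose(cv, 0.5847723161, rel_tol=1e-6)
assert isclose(variance, 20258.38998, rel_tol=1e-6)
assert iqr == 179
assert value_range == 742
```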
Histogram with fixed size bins (bins=50)

Value               Count    Frequency (%)
19                  51352    0.8%
18                  49579    0.8%
187                 49083    0.8%
235                 47491    0.7%
307                 46968    0.7%
163                 46352    0.7%
139                 46054    0.7%
403                 45155    0.7%
43                  45060    0.7%
355                 44787    0.7%
Other values (733)  5890739  92.6%

Value  Count  Frequency (%)
1      2708   < 0.1%
2      1014   < 0.1%
3      552    < 0.1%
4      565    < 0.1%
5      665    < 0.1%
6      1660   < 0.1%
7      6837   0.1%
8      21097  0.3%
9      37628  0.6%
10     35991  0.6%

Value  Count  Frequency (%)
743    8      < 0.1%
742    14     < 0.1%
741    22     < 0.1%
740    6      < 0.1%
739    10     < 0.1%
738    10     < 0.1%
737    10     < 0.1%
736    14     < 0.1%
735    12     < 0.1%
734    8      < 0.1%

type
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 48.5 MiB

CASH_OUT: 2237500
PAYMENT: 2151495
CASH_IN: 1399284
TRANSFER: 532909
DEBIT: 41432

Length

Max length: 8
Median length: 7
Mean length: 7.422395963
Min length: 5

Characters and Unicode

Total characters: 47225885
Distinct characters: 18
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: PAYMENT
2nd row: PAYMENT
3rd row: TRANSFER
4th row: CASH_OUT
5th row: PAYMENT

Common Values

Value     Count    Frequency (%)
CASH_OUT  2237500  35.2%
PAYMENT   2151495  33.8%
CASH_IN   1399284  22.0%
TRANSFER  532909   8.4%
DEBIT     41432    0.7%

Length

Histogram of lengths of the category

Pie chart

Value     Count    Frequency (%)
cash_out  2237500  35.2%
payment   2151495  33.8%
cash_in   1399284  22.0%
transfer  532909   8.4%
debit     41432    0.7%

Most occurring characters

Value             Count    Frequency (%)
A                 6321188  13.4%
T                 4963336  10.5%
S                 4169693  8.8%
N                 4083688  8.6%
C                 3636784  7.7%
H                 3636784  7.7%
_                 3636784  7.7%
E                 2725836  5.8%
O                 2237500  4.7%
U                 2237500  4.7%
Other values (8)  9576792  20.3%

Most occurring categories

Value                  Count     Frequency (%)
Uppercase Letter       43589101  92.3%
Connector Punctuation  3636784   7.7%

Most frequent character per category

Uppercase Letter
Value             Count    Frequency (%)
A                 6321188  14.5%
T                 4963336  11.4%
S                 4169693  9.6%
N                 4083688  9.4%
C                 3636784  8.3%
H                 3636784  8.3%
E                 2725836  6.3%
O                 2237500  5.1%
U                 2237500  5.1%
P                 2151495  4.9%
Other values (7)  7425297  17.0%

Connector Punctuation
Value  Count    Frequency (%)
_      3636784  100.0%

Most occurring scripts

Value   Count     Frequency (%)
Latin   43589101  92.3%
Common  3636784   7.7%

Most frequent character per script

Latin
Value             Count    Frequency (%)
A                 6321188  14.5%
T                 4963336  11.4%
S                 4169693  9.6%
N                 4083688  9.4%
C                 3636784  8.3%
H                 3636784  8.3%
E                 2725836  6.3%
O                 2237500  5.1%
U                 2237500  5.1%
P                 2151495  4.9%
Other values (7)  7425297  17.0%

Common
Value  Count    Frequency (%)
_      3636784  100.0%

Most occurring blocks

Value  Count     Frequency (%)
ASCII  47225885  100.0%

Most frequent character per block

ASCII
Value             Count    Frequency (%)
A                 6321188  13.4%
T                 4963336  10.5%
S                 4169693  8.8%
N                 4083688  8.6%
C                 3636784  7.7%
H                 3636784  7.7%
_                 3636784  7.7%
E                 2725836  5.8%
O                 2237500  4.7%
U                 2237500  4.7%
Other values (8)  9576792  20.3%

amount
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 5316900
Distinct (%): 83.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 179861.9035
Minimum: 0
Maximum: 92445516.64
Zeros: 16
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 48.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 2224.0995
Q1: 13389.57
median: 74871.94
Q3: 208721.4775
95-th percentile: 518634.1965
Maximum: 92445516.64
Range: 92445516.64
Interquartile range (IQR): 195331.9075

Descriptive statistics

Standard deviation: 603858.2315
Coefficient of variation (CV): 3.357343715
Kurtosis: 1797.956705
Mean: 179861.9035
Median Absolute Deviation (MAD): 68393.655
Skewness: 30.99394948
Sum: 1.144392945 × 10^12
Variance: 3.646447637 × 10^11
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                   Count    Frequency (%)
10000000                3207     0.1%
10000                   88       < 0.1%
5000                    79       < 0.1%
15000                   68       < 0.1%
500                     65       < 0.1%
100000                  42       < 0.1%
21500                   37       < 0.1%
120000                  29       < 0.1%
135000                  20       < 0.1%
0                       16       < 0.1%
Other values (5316890)  6358969  99.9%

Value  Count  Frequency (%)
0      16     < 0.1%
0.01   1      < 0.1%
0.02   3      < 0.1%
0.03   2      < 0.1%
0.04   1      < 0.1%
0.06   1      < 0.1%
0.07   1      < 0.1%
0.09   1      < 0.1%
0.1    1      < 0.1%
0.11   2      < 0.1%

Value        Count  Frequency (%)
92445516.64  1      < 0.1%
73823490.36  1      < 0.1%
71172480.42  1      < 0.1%
69886731.3   1      < 0.1%
69337316.27  1      < 0.1%
67500761.29  1      < 0.1%
66761272.21  1      < 0.1%
64234448.19  1      < 0.1%
63847992.58  1      < 0.1%
63294839.63  1      < 0.1%

nameOrig
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 6353307
Distinct (%): 99.9%
Missing: 0
Missing (%): 0.0%
Memory size: 48.5 MiB

C1462946854: 3
C363736674: 3
C1677795071: 3
C400299098: 3
C2098525306: 3
Other values (6353302): 6362605

Length

Max length: 11
Median length: 11
Mean length: 10.48232332
Min length: 5

Characters and Unicode

Total characters: 66695040
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 6344009
Unique (%): 99.7%

Sample

1st row: C1231006815
2nd row: C1666544295
3rd row: C1305486145
4th row: C840083671
5th row: C2048537720

Common Values

Value                   Count    Frequency (%)
C1462946854             3        < 0.1%
C363736674              3        < 0.1%
C1677795071             3        < 0.1%
C400299098              3        < 0.1%
C2098525306             3        < 0.1%
C1999539787             3        < 0.1%
C1976208114             3        < 0.1%
C1832548028             3        < 0.1%
C2051359467             3        < 0.1%
C1530544995             3        < 0.1%
Other values (6353297)  6362590  > 99.9%
> 99.9%

Length

Histogram of lengths of the category

Value                   Count    Frequency (%)
c1530544995             3        < 0.1%
c1462946854             3        < 0.1%
c400299098              3        < 0.1%
c1999539787             3        < 0.1%
c545315117              3        < 0.1%
c1065307291             3        < 0.1%
c724452879              3        < 0.1%
c1902386530             3        < 0.1%
c1677795071             3        < 0.1%
c2051359467             3        < 0.1%
Other values (6353297)  6362590  > 99.9%

Most occurring characters

Value  Count    Frequency (%)
1      8803448  13.2%
C      6362620  9.5%
2      6136135  9.2%
3      5699596  8.5%
4      5693146  8.5%
7      5669437  8.5%
5      5668010  8.5%
6      5667725  8.5%
0      5667074  8.5%
9      5665212  8.5%

Most occurring categories

Value             Count     Frequency (%)
Decimal Number    60332420  90.5%
Uppercase Letter  6362620   9.5%

Most frequent character per category

Decimal Number
Value  Count    Frequency (%)
1      8803448  14.6%
2      6136135  10.2%
3      5699596  9.4%
4      5693146  9.4%
7      5669437  9.4%
5      5668010  9.4%
6      5667725  9.4%
0      5667074  9.4%
9      5665212  9.4%
8      5662637  9.4%

Uppercase Letter
Value  Count    Frequency (%)
C      6362620  100.0%

Most occurring scripts

Value   Count     Frequency (%)
Common  60332420  90.5%
Latin   6362620   9.5%

Most frequent character per script

Common
Value  Count    Frequency (%)
1      8803448  14.6%
2      6136135  10.2%
3      5699596  9.4%
4      5693146  9.4%
7      5669437  9.4%
5      5668010  9.4%
6      5667725  9.4%
0      5667074  9.4%
9      5665212  9.4%
8      5662637  9.4%

Latin
Value  Count    Frequency (%)
C      6362620  100.0%

Most occurring blocks

Value  Count     Frequency (%)
ASCII  66695040  100.0%

Most frequent character per block

ASCII
Value  Count    Frequency (%)
1      8803448  13.2%
C      6362620  9.5%
2      6136135  9.2%
3      5699596  8.5%
4      5693146  8.5%
7      5669437  8.5%
5      5668010  8.5%
6      5667725  8.5%
0      5667074  8.5%
9      5665212  8.5%

oldbalanceOrg
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 1845844
Distinct (%): 29.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 833883.1041
Minimum: 0
Maximum: 59585040.37
Zeros: 2102449
Zeros (%): 33.0%
Negative: 0
Negative (%): 0.0%
Memory size: 48.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 14208
Q3: 107315.175
95-th percentile: 5823702.278
Maximum: 59585040.37
Range: 59585040.37
Interquartile range (IQR): 107315.175

Descriptive statistics

Standard deviation: 2888242.673
Coefficient of variation (CV): 3.46360618
Kurtosis: 32.96487854
Mean: 833883.1041
Median Absolute Deviation (MAD): 14208
Skewness: 5.249136421
Sum: 5.305681316 × 10^12
Variance: 8.341945738 × 10^12
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                   Count    Frequency (%)
0                       2102449  33.0%
184                     918      < 0.1%
133                     914      < 0.1%
195                     912      < 0.1%
164                     909      < 0.1%
109                     908      < 0.1%
181                     908      < 0.1%
157                     902      < 0.1%
146                     899      < 0.1%
136                     898      < 0.1%
Other values (1845834)  4252003  66.8%

Value  Count    Frequency (%)
0      2102449  33.0%
0.05   1        < 0.1%
0.18   1        < 0.1%
0.21   1        < 0.1%
0.44   1        < 0.1%
0.67   1        < 0.1%
1      370      < 0.1%
1.02   1        < 0.1%
1.37   1        < 0.1%
1.38   1        < 0.1%

Value        Count  Frequency (%)
59585040.37  1      < 0.1%
57316255.05  1      < 0.1%
50399045.08  1      < 0.1%
49585040.37  1      < 0.1%
47316255.05  1      < 0.1%
45674547.89  1      < 0.1%
44892193.09  1      < 0.1%
43818855.3   1      < 0.1%
43686616.33  1      < 0.1%
42542664.27  1      < 0.1%

newbalanceOrig
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 2682586
Distinct (%): 42.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 855113.6686
Minimum: 0
Maximum: 49585040.37
Zeros: 3609566
Zeros (%): 56.7%
Negative: 0
Negative (%): 0.0%
Memory size: 48.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 144258.41
95-th percentile: 5980262.336
Maximum: 49585040.37
Range: 49585040.37
Interquartile range (IQR): 144258.41

Descriptive statistics

Standard deviation: 2924048.503
Coefficient of variation (CV): 3.419485164
Kurtosis: 32.06698456
Mean: 855113.6686
Median Absolute Deviation (MAD): 0
Skewness: 5.176884001
Sum: 5.44076333 × 10^12
Variance: 8.550059648 × 10^12
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                   Count    Frequency (%)
0                       3609566  56.7%
9011.73                 4        < 0.1%
7468.59                 4        < 0.1%
8927.38                 4        < 0.1%
4019.43                 4        < 0.1%
7717.83                 4        < 0.1%
36875.73                4        < 0.1%
7070.1                  4        < 0.1%
10528.49                4        < 0.1%
7802.01                 4        < 0.1%
Other values (2682576)  2753018  43.3%

Value  Count    Frequency (%)
0      3609566  56.7%
0.01   1        < 0.1%
0.03   1        < 0.1%
0.05   1        < 0.1%
0.12   1        < 0.1%
0.13   1        < 0.1%
0.18   1        < 0.1%
0.21   1        < 0.1%
0.23   1        < 0.1%
0.3    1        < 0.1%

Value        Count  Frequency (%)
49585040.37  1      < 0.1%
47316255.05  1      < 0.1%
43686616.33  1      < 0.1%
43673802.21  1      < 0.1%
41690842.64  1      < 0.1%
41432359.46  1      < 0.1%
40399045.08  1      < 0.1%
39585040.37  1      < 0.1%
38946233.02  1      < 0.1%
38939424.03  1      < 0.1%

nameDest
Categorical

HIGH CARDINALITY

Distinct: 2722362
Distinct (%): 42.8%
Missing: 0
Missing (%): 0.0%
Memory size: 48.5 MiB

C1286084959: 113
C985934102: 109
C665576141: 105
C2083562754: 102
C1590550415: 101
Other values (2722357): 6362090

Length

Max length: 11
Median length: 11
Mean length: 10.48175201
Min length: 2

Characters and Unicode

Total characters: 66691405
Distinct characters: 12
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2262704
Unique (%): 35.6%

Sample

1st row: M1979787155
2nd row: M2044282225
3rd row: C553264065
4th row: C38997010
5th row: M1230701703

Common Values

Value                   Count    Frequency (%)
C1286084959             113      < 0.1%
C985934102              109      < 0.1%
C665576141              105      < 0.1%
C2083562754             102      < 0.1%
C1590550415             101      < 0.1%
C248609774              101      < 0.1%
C451111351              99       < 0.1%
C1789550256             99       < 0.1%
C1360767589             98       < 0.1%
C1023714065             97       < 0.1%
Other values (2722352)  6361596  > 99.9%

Length

Histogram of lengths of the category

Value                   Count    Frequency (%)
c1286084959             113      < 0.1%
c985934102              109      < 0.1%
c665576141              105      < 0.1%
c2083562754             102      < 0.1%
c248609774              101      < 0.1%
c1590550415             101      < 0.1%
c1789550256             99       < 0.1%
c451111351              99       < 0.1%
c1360767589             98       < 0.1%
c1023714065             97       < 0.1%
Other values (2722352)  6361596  > 99.9%

Most occurring characters

Value             Count    Frequency (%)
1                 8799996  13.2%
2                 6133780  9.2%
3                 5704404  8.6%
4                 5691070  8.5%
8                 5675627  8.5%
9                 5668861  8.5%
7                 5665128  8.5%
0                 5664751  8.5%
6                 5662897  8.5%
5                 5662271  8.5%
Other values (2)  6362620  9.5%

Most occurring categories

Value             Count     Frequency (%)
Decimal Number    60328785  90.5%
Uppercase Letter  6362620   9.5%

Most frequent character per category

Decimal Number
Value  Count    Frequency (%)
1      8799996  14.6%
2      6133780  10.2%
3      5704404  9.5%
4      5691070  9.4%
8      5675627  9.4%
9      5668861  9.4%
7      5665128  9.4%
0      5664751  9.4%
6      5662897  9.4%
5      5662271  9.4%

Uppercase Letter
Value  Count    Frequency (%)
C      4211125  66.2%
M      2151495  33.8%

Most occurring scripts

Value   Count     Frequency (%)
Common  60328785  90.5%
Latin   6362620   9.5%

Most frequent character per script

Common
Value  Count    Frequency (%)
1      8799996  14.6%
2      6133780  10.2%
3      5704404  9.5%
4      5691070  9.4%
8      5675627  9.4%
9      5668861  9.4%
7      5665128  9.4%
0      5664751  9.4%
6      5662897  9.4%
5      5662271  9.4%

Latin
Value  Count    Frequency (%)
C      4211125  66.2%
M      2151495  33.8%

Most occurring blocks

Value  Count     Frequency (%)
ASCII  66691405  100.0%

Most frequent character per block

ASCII
Value             Count    Frequency (%)
1                 8799996  13.2%
2                 6133780  9.2%
3                 5704404  8.6%
4                 5691070  8.5%
8                 5675627  8.5%
9                 5668861  8.5%
7                 5665128  8.5%
0                 5664751  8.5%
6                 5662897  8.5%
5                 5662271  8.5%
Other values (2)  6362620  9.5%

oldbalanceDest
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 3614697
Distinct (%): 56.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1100701.667
Minimum: 0
Maximum: 356015889.4
Zeros: 2704388
Zeros (%): 42.5%
Negative: 0
Negative (%): 0.0%
Memory size: 48.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 132705.665
Q3: 943036.7075
95-th percentile: 5147229.713
Maximum: 356015889.4
Range: 356015889.4
Interquartile range (IQR): 943036.7075

Descriptive statistics

Standard deviation: 3399180.113
Coefficient of variation (CV): 3.088193846
Kurtosis: 948.6741254
Mean: 1100701.667
Median Absolute Deviation (MAD): 132705.665
Skewness: 19.92175792
Sum: 7.003346437 × 10^12
Variance: 1.155442544 × 10^13
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                   Count    Frequency (%)
0                       2704388  42.5%
10000000                615      < 0.1%
20000000                219      < 0.1%
30000000                86       < 0.1%
40000000                31       < 0.1%
102                     21       < 0.1%
198                     19       < 0.1%
125                     18       < 0.1%
132                     18       < 0.1%
160                     18       < 0.1%
Other values (3614687)  3657187  57.5%

Value  Count    Frequency (%)
0      2704388  42.5%
0.01   1        < 0.1%
0.03   1        < 0.1%
0.13   1        < 0.1%
0.33   1        < 0.1%
0.37   1        < 0.1%
0.79   1        < 0.1%
1      7        < 0.1%
1.39   1        < 0.1%
1.64   1        < 0.1%

Value        Count  Frequency (%)
356015889.4  1      < 0.1%
355553416.3  1      < 0.1%
355381433.6  1      < 0.1%
355380483.5  1      < 0.1%
355185537.1  1      < 0.1%
328194464.9  1      < 0.1%
327998074.2  1      < 0.1%
327963024    1      < 0.1%
327852121.4  1      < 0.1%
327827763.4  1      < 0.1%

newbalanceDest
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 3555499
Distinct (%): 55.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1224996.398
Minimum: 0
Maximum: 356179278.9
Zeros: 2439433
Zeros (%): 38.3%
Negative: 0
Negative (%): 0.0%
Memory size: 48.5 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 214661.44
Q3: 1111909.25
95-th percentile: 5515715.903
Maximum: 356179278.9
Range: 356179278.9
Interquartile range (IQR): 1111909.25

Descriptive statistics

Standard deviation: 3674128.942
Coefficient of variation (CV): 2.999297751
Kurtosis: 862.1565079
Mean: 1224996.398
Median Absolute Deviation (MAD): 214661.44
Skewness: 19.35230206
Sum: 7.794186583 × 10^12
Variance: 1.349922348 × 10^13
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                   Count    Frequency (%)
0                       2439433  38.3%
10000000                53       < 0.1%
971418.91               32       < 0.1%
19169204.93             29       < 0.1%
16532032.16             25       < 0.1%
1254956.07              25       < 0.1%
1412484.09              22       < 0.1%
1178808.14              21       < 0.1%
4743010.67              21       < 0.1%
7364724.84              21       < 0.1%
Other values (3555489)  3922938  61.7%

Value  Count    Frequency (%)
0      2439433  38.3%
0.01   1        < 0.1%
0.33   1        < 0.1%
1.39   1        < 0.1%
1.64   1        < 0.1%
1.74   1        < 0.1%
2.15   1        < 0.1%
2.45   1        < 0.1%
2.71   1        < 0.1%
2.76   1        < 0.1%

Value        Count  Frequency (%)
356179278.9  1      < 0.1%
356015889.4  1      < 0.1%
355553416.3  2      < 0.1%
355381433.6  1      < 0.1%
355380483.5  1      < 0.1%
355185537.1  1      < 0.1%
328431698.2  1      < 0.1%
328194464.9  1      < 0.1%
327998074.2  1      < 0.1%
327963024    1      < 0.1%

isFraud
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 48.5 MiB

0: 6354407
1: 8213

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 6362620
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 1
4th row: 1
5th row: 0

Common Values

Value  Count    Frequency (%)
0      6354407  99.9%
1      8213     0.1%

Length

Histogram of lengths of the category

Pie chart

Value  Count    Frequency (%)
0      6354407  99.9%
1      8213     0.1%

Most occurring characters

Value  Count    Frequency (%)
0      6354407  99.9%
1      8213     0.1%

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  6362620  100.0%

Most frequent character per category

Decimal Number
Value  Count    Frequency (%)
0      6354407  99.9%
1      8213     0.1%

Most occurring scripts

Value   Count    Frequency (%)
Common  6362620  100.0%

Most frequent character per script

Common
Value  Count    Frequency (%)
0      6354407  99.9%
1      8213     0.1%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  6362620  100.0%

Most frequent character per block

ASCII
Value  Count    Frequency (%)
0      6354407  99.9%
1      8213     0.1%

isFlaggedFraud
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 48.5 MiB

0: 6362604
1: 16

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 6362620
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count    Frequency (%)
0      6362604  > 99.9%
1      16       < 0.1%

Length

Histogram of lengths of the category

Pie chart

Value  Count    Frequency (%)
0      6362604  > 99.9%
1      16       < 0.1%

Most occurring characters

Value  Count    Frequency (%)
0      6362604  > 99.9%
1      16       < 0.1%

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  6362620  100.0%

Most frequent character per category

Decimal Number
Value  Count    Frequency (%)
0      6362604  > 99.9%
1      16       < 0.1%

Most occurring scripts

Value   Count    Frequency (%)
Common  6362620  100.0%

Most frequent character per script

Common
Value  Count    Frequency (%)
0      6362604  > 99.9%
1      16       < 0.1%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  6362620  100.0%

Most frequent character per block

ASCII
Value  Count    Frequency (%)
0      6362604  > 99.9%
1      16       < 0.1%

Interactions

[36 interaction plots; images not reproduced]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
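
The calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code the report generator uses; the function name `pearson_r` is an assumption, and the sample values are the oldbalanceOrg and newbalanceOrig figures of the first five sample rows.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of products and squares are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    # Note: a constant sequence has zero spread and raises ZeroDivisionError here.
    return cov / (sd_x * sd_y)

old_balance = [170136.00, 21249.00, 181.00, 181.00, 41554.00]
new_balance = [160296.36, 19384.72, 0.00, 0.00, 29885.86]
print(pearson_r(old_balance, new_balance))

# Shifting and rescaling one variable leaves r unchanged (up to rounding).
rescaled = [3 * x + 5 for x in old_balance]
print(pearson_r(rescaled, new_balance))
```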

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
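
A minimal sketch of that recipe: replace each value by its rank, then apply the Pearson formula to the ranks. Giving tied values the average of their ranks is a common convention and an assumption here; the helper names are illustrative.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# A nonlinear but strictly increasing relationship: the ranks agree perfectly.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

On this toy input ρ is 1 (up to rounding) while r is below 1, which is the difference between monotonic and linear correlation described above.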

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
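
The pair-counting definition can be written down directly. The sketch below implements exactly the formula in the text (often called tau-a); statistical libraries commonly default to a tie-adjusted variant, so their results may differ when the data contain ties. The function name is illustrative.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # both variables move the same way
        elif direction < 0:
            discordant += 1      # the variables move in opposite ways
        # direction == 0 means a tie in x or y: neither concordant nor discordant
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

print(kendall_tau_a([1, 2, 3], [1, 3, 2]))  # 2 concordant, 1 discordant, 3 pairs
```

This direct version compares every pair, so its cost grows quadratically with the number of observations; it is meant to illustrate the definition rather than to process millions of rows.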

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
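
For orientation, here is a sketch of a bias-corrected Cramér's V computed from a contingency table of counts, following the correction as it is usually stated for Bergsma's estimator. Whether the report generator computes it in exactly this way is an assumption; the function name and the toy tables are illustrative only.

```python
from math import sqrt

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    r, k = len(table), len(table[0])
    if r < 2 or k < 2:
        raise ValueError("need at least a 2 x 2 table")
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic (assumes no empty row or column).
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))

print(cramers_v_corrected([[10, 10], [10, 10]]))  # independent toy table
print(cramers_v_corrected([[50, 0], [0, 50]]))    # perfectly associated toy table
```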

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

   step  type      amount    nameOrig     oldbalanceOrg  newbalanceOrig  nameDest     oldbalanceDest  newbalanceDest  isFraud  isFlaggedFraud
0  1     PAYMENT   9839.64   C1231006815  170136.00      160296.36       M1979787155  0.0             0.00            0        0
1  1     PAYMENT   1864.28   C1666544295  21249.00       19384.72        M2044282225  0.0             0.00            0        0
2  1     TRANSFER  181.00    C1305486145  181.00         0.00            C553264065   0.0             0.00            1        0
3  1     CASH_OUT  181.00    C840083671   181.00         0.00            C38997010    21182.0         0.00            1        0
4  1     PAYMENT   11668.14  C2048537720  41554.00       29885.86        M1230701703  0.0             0.00            0        0
5  1     PAYMENT   7817.71   C90045638    53860.00       46042.29        M573487274   0.0             0.00            0        0
6  1     PAYMENT   7107.77   C154988899   183195.00      176087.23       M408069119   0.0             0.00            0        0
7  1     PAYMENT   7861.64   C1912850431  176087.23      168225.59       M633326333   0.0             0.00            0        0
8  1     PAYMENT   4024.36   C1265012928  2671.00        0.00            M1176932104  0.0             0.00            0        0
9  1     DEBIT     5337.77   C712410124   41720.00       36382.23        C195600860   41898.0         40348.79        0        0
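
The first rows suggest why oldbalanceOrg and newbalanceOrig are flagged as highly correlated: in each of these ten rows the new origin balance equals the old balance minus the amount, floored at zero. The sketch below checks that rule on the sample only. The rule itself is a hypothesis read off these rows, not a documented property of the dataset, and it is not expected to hold for transaction types that add money to the origin account.

```python
from math import isclose

# (type, amount, oldbalanceOrg, newbalanceOrig) taken from the first rows above.
FIRST_ROWS = [
    ("PAYMENT", 9839.64, 170136.00, 160296.36),
    ("PAYMENT", 1864.28, 21249.00, 19384.72),
    ("TRANSFER", 181.00, 181.00, 0.00),
    ("CASH_OUT", 181.00, 181.00, 0.00),
    ("PAYMENT", 11668.14, 41554.00, 29885.86),
    ("PAYMENT", 7817.71, 53860.00, 46042.29),
    ("PAYMENT", 7107.77, 183195.00, 176087.23),
    ("PAYMENT", 7861.64, 176087.23, 168225.59),
    ("PAYMENT", 4024.36, 2671.00, 0.00),
    ("DEBIT", 5337.77, 41720.00, 36382.23),
]

def expected_new_balance(old_balance, amount):
    """Hypothetical rule: debit the amount, never dropping below zero."""
    return max(old_balance - amount, 0.0)

# Rows whose reported new balance differs from the rule by more than half a cent.
mismatches = [
    row for row in FIRST_ROWS
    if not isclose(expected_new_balance(row[2], row[1]), row[3], abs_tol=0.005)
]
print(f"{len(FIRST_ROWS) - len(mismatches)} of {len(FIRST_ROWS)} sample rows follow the rule")
```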

Last rows

         step  type      amount      nameOrig     oldbalanceOrg  newbalanceOrig  nameDest     oldbalanceDest  newbalanceDest  isFraud  isFlaggedFraud
6362610  742   TRANSFER  63416.99    C778071008   63416.99       0.0             C1812552860  0.00            0.00            1        0
6362611  742   CASH_OUT  63416.99    C994950684   63416.99       0.0             C1662241365  276433.18       339850.17       1        0
6362612  743   TRANSFER  1258818.82  C1531301470  1258818.82     0.0             C1470998563  0.00            0.00            1        0
6362613  743   CASH_OUT  1258818.82  C1436118706  1258818.82     0.0             C1240760502  503464.50       1762283.33      1        0
6362614  743   TRANSFER  339682.13   C2013999242  339682.13      0.0             C1850423904  0.00            0.00            1        0
6362615  743   CASH_OUT  339682.13   C786484425   339682.13      0.0             C776919290   0.00            339682.13       1        0
6362616  743   TRANSFER  6311409.28  C1529008245  6311409.28     0.0             C1881841831  0.00            0.00            1        0
6362617  743   CASH_OUT  6311409.28  C1162922333  6311409.28     0.0             C1365125890  68488.84        6379898.11      1        0
6362618  743   TRANSFER  850002.52   C1685995037  850002.52      0.0             C2080388513  0.00            0.00            1        0
6362619  743   CASH_OUT  850002.52   C1280323807  850002.52      0.0             C873221189   6510099.11      7360101.63      1        0